
    ACOTES project: Advanced compiler technologies for embedded streaming

    Streaming applications are built of data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task: the computation must be carefully partitioned and mapped to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of proposed solutions whose goal is to improve the programmer's productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another, more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the Advanced Compiler Technologies that we developed to support Embedded Streaming.
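    A minimal sketch of the pipeline structure described above, written here with POSIX threads rather than with the ACOTES programming model (whose API is not reproduced in this listing): three tasks, producer, filter and consumer, connected by bounded FIFO streams. The names stream_t, producer, filter and consumer are illustrative choices, not ACOTES identifiers.

    /* Illustrative pipeline sketch: producer -> filter -> consumer.
     * Not the ACOTES API; each stage is a task, each stream_t a bounded
     * FIFO stream connecting two tasks. Compile with -pthread. */
    #include <pthread.h>
    #include <stdio.h>

    #define CAP 16       /* capacity of each stream buffer */
    #define N   1000     /* number of items flowing through the pipeline */

    typedef struct {     /* bounded FIFO connecting two tasks */
        int buf[CAP];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_full, not_empty;
    } stream_t;

    static void stream_init(stream_t *s) {
        s->head = s->tail = s->count = 0;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->not_full, NULL);
        pthread_cond_init(&s->not_empty, NULL);
    }

    static void stream_push(stream_t *s, int v) {   /* blocks while full */
        pthread_mutex_lock(&s->lock);
        while (s->count == CAP) pthread_cond_wait(&s->not_full, &s->lock);
        s->buf[s->tail] = v; s->tail = (s->tail + 1) % CAP; s->count++;
        pthread_cond_signal(&s->not_empty);
        pthread_mutex_unlock(&s->lock);
    }

    static int stream_pop(stream_t *s) {            /* blocks while empty */
        pthread_mutex_lock(&s->lock);
        while (s->count == 0) pthread_cond_wait(&s->not_empty, &s->lock);
        int v = s->buf[s->head]; s->head = (s->head + 1) % CAP; s->count--;
        pthread_cond_signal(&s->not_full);
        pthread_mutex_unlock(&s->lock);
        return v;
    }

    static stream_t ab, bc;  /* producer->filter and filter->consumer streams */

    static void *producer(void *arg) {              /* task 1: emit 0..N-1 */
        (void)arg;
        for (int i = 0; i < N; i++) stream_push(&ab, i);
        return NULL;
    }

    static void *filter(void *arg) {                /* task 2: square each item */
        (void)arg;
        for (int i = 0; i < N; i++) { int v = stream_pop(&ab); stream_push(&bc, v * v); }
        return NULL;
    }

    static void *consumer(void *arg) {              /* task 3: accumulate results */
        (void)arg;
        long sum = 0;
        for (int i = 0; i < N; i++) sum += stream_pop(&bc);
        printf("sum of squares = %ld\n", sum);
        return NULL;
    }

    int main(void) {
        stream_init(&ab); stream_init(&bc);
        pthread_t t[3];
        pthread_create(&t[0], NULL, producer, NULL);
        pthread_create(&t[1], NULL, filter,   NULL);
        pthread_create(&t[2], NULL, consumer, NULL);
        for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
        return 0;
    }

    Partitioning a computation into such tasks, sizing the buffers and placing the stages on cores by hand is exactly the effort that the ACOTES compiler-assisted mapping aims to automate.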

    Improved Loop Tiling based on the Removal of Spurious False Dependences

    Selected for presentation at the HiPEAC 2013 conference. To preserve the validity of loop nest transformations and parallelization, data dependences need to be analyzed. Memory dependences come in two varieties: true dependences and false dependences. While true dependences must be satisfied in order to preserve the correct order of computations, false dependences are induced by the reuse of a single memory location to store multiple values. False dependences reduce the degrees of freedom for loop transformations; in particular, loop tiling is severely limited in their presence. While array expansion removes all false dependences, the overhead on memory and the detrimental impact on register-level reuse can be catastrophic. We propose and evaluate a compilation technique to safely ignore a large number of false dependences in order to enable loop nest tiling in the polyhedral model. It is based on the precise characterization of interferences between live-range intervals, and it does not incur any scalar or array expansion. Our algorithms have been implemented in the Pluto polyhedral compiler and evaluated on the PolyBench suite.
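    The kind of false dependence targeted by such a technique can be illustrated with the hypothetical kernel below (the names kernel, row, A and B are chosen for illustration and do not come from the paper): the temporary row is written before being read in every iteration of i, so its live ranges never overlap across iterations, yet its reuse induces write-after-write and write-after-read dependences that a naive dependence test treats as serializing the i loop.

    /* Illustrative only: a reused temporary whose false dependences block tiling.
     * Each i iteration writes row[] entirely before reading it, so the live
     * ranges of row[] are confined to one iteration; an analysis of live-range
     * interferences can safely ignore the induced WAW/WAR dependences without
     * expanding row into a per-iteration array. */
    #define N 1024
    double A[N][N], B[N][N];
    double row[N];                   /* single buffer reused by every i iteration */

    void kernel(void)
    {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)      /* write the whole temporary ... */
                row[j] = 2.0 * A[i][j];
            for (int j = 0; j < N; j++)      /* ... then read it back */
                B[i][j] = row[j] + row[(j + 1) % N];
        }
    }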

    Polyhedral-Model Guided Loop-Nest Auto-Vectorization

    Optimizing compilers apply numerous interdependent optimizations, leading to the notoriously difficult phase-ordering problem: that of deciding which transformations to apply and in which order. Fortunately, new infrastructures such as the polyhedral compilation framework host a variety of transformations, facilitating the efficient exploration and configuration of multiple transformation sequences. Many powerful optimizations, however, remain external to the polyhedral framework, including vectorization. The low-level, target-specific aspects of vectorization for fine-grain SIMD have so far excluded it from the polyhedral framework. In this paper we examine the interactions between loop transformations of the polyhedral framework and subsequent vectorization. We model the performance impact of the different loop transformations and vectorization strategies, and then show how this cost model can be integrated seamlessly into the polyhedral representation. This predictive modelling facilitates efficient exploration and educated decision making to best apply various polyhedral loop transformations while considering the subsequent effects of different vectorization schemes. Our work demonstrates the feasibility and benefit of tuning the polyhedral model in the context of vectorization. Experimental results confirm that our model gives accurate predictions, providing speedups of over 2.0x on average over traditional innermost-loop vectorization on PowerPC970 and Cell-SPU SIMD platforms.
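    A small, hypothetical example of the interaction being modeled (the function and array names below are illustrative, not taken from the paper): the loop order chosen inside the polyhedral framework determines whether the vectorizer later sees unit-stride or strided memory accesses, which is exactly the kind of effect a vectorization-aware cost model has to predict before committing to a transformation.

    /* Illustrative only: the loop order picked by the polyhedral framework
     * decides what the vectorizer can do afterwards. With j innermost, the
     * accesses to A[i][j] and B[i][j] are unit-stride (contiguous) and
     * vectorize cheaply; with i innermost, the same statement needs strided
     * loads and stores, which a SIMD cost model would penalize. */
    #define N 1024
    float A[N][N], B[N][N], c;

    void scale_unit_stride(void)          /* j innermost: contiguous accesses */
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] = c * B[i][j];
    }

    void scale_strided(void)              /* interchanged: strided innermost accesses */
    {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                A[i][j] = c * B[i][j];
    }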